Space-optimal Heavy Hitters with Strong Error Bounds

Authors

  • Radu Berinde
  • Graham Cormode
  • Piotr Indyk
  • Martin J. Strauss

Abstract

The problem of finding heavy hitters and approximating the frequencies of items is at the heart of many problems in data stream analysis. It has been observed that several proposed solutions to this problem can outperform their worst-case guarantees on real data. This raises the question of whether stronger bounds can be guaranteed. We answer this in the affirmative by showing that a class of “counter-based algorithms” (including the popular and very space-efficient FREQUENT and SPACESAVING algorithms) provides much stronger approximation guarantees than previously known. Specifically, we show that errors in the approximation of individual elements do not depend on the frequencies of the most frequent elements, but only on the frequency of the remaining “tail.” This shows that counter-based methods are the most space-efficient (in fact, space-optimal) algorithms having this strong error bound. This tail guarantee allows these algorithms to solve the “sparse recovery” problem. Here, the goal is to recover a faithful representation of the vector of frequencies, $f$. We prove that, using space $O(k)$, the algorithms construct an approximation $f^*$ to the frequency vector $f$ so that the $L_1$ error $\|f - f^*\|_1$ is close to the best possible error $\min_{f'} \|f' - f\|_1$, where $f'$ ranges over all vectors with at most $k$ non-zero entries. This improves the previously best known space bound of about $O(k \log n)$ for streams without element deletions (where $n$ is the size of the domain from which stream elements are drawn). Other consequences of the tail guarantees are results for skewed (Zipfian) data, and guarantees for the accuracy of merging multiple summarized streams.

∗Supported in part by a David and Lucille Packard Fellowship, by MADALGO (Center for Massive Data Algorithmics, funded by the Danish National Research Association), and by NSF grant CCF0728645.
†Supported by NSF CAREER award CCF 0743372 and DARPA/ONR N66001-08-1-2065.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PODS’09, June 29–July 2, 2009, Providence, Rhode Island, USA.
Copyright 2009 ACM 978-1-60558-553-6/09/06 ...$5.00.
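The abstract names two counter-based algorithms, FREQUENT and SPACESAVING, and a tail guarantee under which per-item error depends only on the frequency mass outside the $k$ most frequent items (written here as $F_1^{res(k)}$). The Python sketch below is an illustration under our own assumptions, not the authors' code: it uses the textbook update rule of each algorithm with $m$ counters, the helper names (`space_saving`, `frequent`, `residual_tail`, `k_sparse_recovery`, `l1_error`) are hypothetical, and the bound it prints, $F_1^{res(k)}/(m-k)$, is our reading of the tail guarantee with unit constants (the abstract itself does not state the constants). It also forms a $k$-sparse $f^*$ by keeping the $k$ largest estimates and compares $\|f - f^*\|_1$ with the best possible $k$-sparse error.

```python
import random
from collections import Counter


def space_saving(stream, m):
    """SPACESAVING with m counters; returns {item: estimate}, never an underestimate."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < m:
            counters[x] = 1
        else:
            # Replace the item holding the smallest counter; the newcomer
            # inherits that count plus one. (Linear scan kept for brevity.)
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters


def frequent(stream, m):
    """FREQUENT (Misra-Gries) with m counters; estimates never overestimate."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < m:
            counters[x] = 1
        else:
            # Decrement every counter and drop those reaching zero;
            # the arriving item is discarded as well.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters


def residual_tail(freqs, k):
    """F1^{res(k)}: total frequency outside the k most frequent items."""
    return sum(sorted(freqs.values(), reverse=True)[k:])


def k_sparse_recovery(estimates, k):
    """Keep only the k largest estimates, giving a k-sparse approximation f*."""
    top = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


def l1_error(f, f_star):
    """||f - f*||_1 over the union of supports."""
    keys = set(f) | set(f_star)
    return sum(abs(f.get(i, 0) - f_star.get(i, 0)) for i in keys)


rng = random.Random(7)
n = 1000
# Hypothetical skewed input: item r is drawn with probability proportional to 1/(r+1).
stream = rng.choices(range(n), weights=[1 / (r + 1) for r in range(n)], k=20000)
f = Counter(stream)
m, k = 50, 10
# Assumed form of the tail bound (our reading; constants are not in the abstract).
bound = residual_tail(f, k) / (m - k)

for name, algo in (("SPACESAVING", space_saving), ("FREQUENT", frequent)):
    est = algo(stream, m)
    worst = max(abs(f[i] - est.get(i, 0)) for i in f)
    f_star = k_sparse_recovery(est, k)
    print(f"{name}: worst error {worst} <= bound {bound:.1f}; "
          f"L1 error of f* = {l1_error(f, f_star)} vs optimum {residual_tail(f, k)}")
```

On this synthetic stream both summaries should report a worst-case per-item error no larger than the printed bound, which depends only on the tail beyond the top $k$ items rather than on the total stream length. The linear-time eviction in `space_saving` keeps the sketch short; a practical implementation would typically use a heap or a bucketed structure to find the minimum counter quickly.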

Similar resources

An Optimal Algorithm for ℓ1-Heavy Hitters in Insertion Streams and Related Problems

We give the first optimal bounds for returning the ℓ1-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \ldots, n\}$ and parameters $0 < \varepsilon < \varphi \le 1$, let $f_i$ denote the frequency of item $i$, i.e., the number of times item $i$ occurs in the stream. With arbitrarily large constant probab...

Heavy Hitters and the Structure of Local Privacy

We present a new locally differentially private algorithm for the heavy hitters problem which achieves optimal worst-case error as a function of all standardly considered parameters. Prior work obtained error rates which depend optimally on the number of users, the size of the domain, and the privacy parameter, but depend sub-optimally on the failure probability. We strengthen existing lower bo...

On Low-Risk Heavy Hitters and Sparse Recovery Schemes

We study the heavy hitters and related sparse recovery problems in the low-failure probability regime. This regime is not well-understood, and has only been studied for non-adaptive schemes. The main previous work is on sparse recovery by Gilbert et al. (ICALP’13). We recognize an error in their analysis, improve their results, and contribute new non-adaptive and adaptive sparse recovery algori...

Practical Locally Private Heavy Hitters

We present new heavy-hitters algorithms satisfying local differential privacy, with optimal or near-optimal worst-case error, running time, and memory. In our algorithms, the server running time is $\tilde O(n)$ and the user running time is $\tilde O(1)$, improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015], which requires $O(n^{5/2})$ server time and $O(n^{3/2})$ user tim...

Hierarchical Heavy Hitters with the Space Saving Algorithm

The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming approximation algorithm for computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It improves on the worst-case time and spa...

Publication date: 2009